(54) Face detecting and recognition camera and method



(57) A method for determining the presence of a face from image data includes a face detection algorithm
having two separate algorithmic steps: a first step of prescreening image data with a first component of the
algorithm to find one or more face candidate regions of the image based on a comparison between facial shape
models and facial probabilities assigned to image pixels within the region; and a second step of operating on
the face candidate regions with a second component of the algorithm using a pattern matching technique to
examine each face candidate region of the image and thereby confirm a facial presence in the region, whereby
the combination of these components provides higher performance in terms of detection levels than either
component individually. In a camera implementation, a digital camera includes an algorithm memory for storing
an algorithm comprised of the aforementioned first and second components and an electronic processing section
for processing the image data together with the algorithm for determining the presence of one or more faces
in the scene. Facial data indicating the presence of faces may be used to control, e.g., exposure parameters
of the capture of an image, or to produce processed image data that relates, e.g., color balance, to the
presence of faces in the image, or the facial data may be stored together with the image data on a storage
medium.
Description

[0001] The present invention is in the field of image capture, and in particular in the field of image processing for the
purpose of enhancing and optimizing the process of image capture by a camera.
[0002] A preponderance of images collected by photographers contain people, who are often the most important
subjects of the images. Knowledge of the presence and location of people in an image, and especially the presence
and location of their faces, could enable many beneficial improvements to be made in the image capture process.
Some are suggested in the prior art. For example, automatic and semi-automatic focusing cameras often pick a portion
of the scene on which to adjust for best focus. If the camera could locate the faces in a scene, then focus could be
optimized for the faces unless the photographer explicitly overrides that choice. In U.S. Patent No. 5,835,616 a face
detection system is used in automated photography to eliminate manual adjustment problems that can result in poor
quality from lack of focused subjects.

[0003] Furthermore, detection of the faces in a scene gives very strong evidence of the proper location of the principal
subject matter. In that connection, the process disclosed in the '616 patent automatically finds a human face in a
digitized image taken by a digital camera, confirms the existence of the face by examining facial features and then has
the camera automatically center itself on the detected face. Detection of a face also yields strong evidence of proper
color balance for the facial and/or skin area. For example, in U.S. Patent No. 5,430,809 a video camera autonomously
tracks a facial target in order to set a measuring frame on the facial object for purposes of auto exposure and auto focus.
In addition, once the measuring frame is set, an auto white balance system adjusts colors to obtain optimal skin color
on the face. As a result, the auto white balance system is said to perform auto skin color balance. It is also known
(from U.S. Patent No. 5,629,752) to detect a human face and then to utilize data representing color and/or density of
the facial region to determine an exposure amount such that the region corresponding to the face can be printed
appropriately by a photographic printer.

[0004] While face detection has been studied over the past several years in relation to the subject of image
understanding, it remains an area with impressive computational requirements, particularly if a robust face detection
algorithm is needed. A number of methods have been devised that show reasonable performance over a range of imaging
conditions. Such methods may be more successfully implemented in large scale processing equipment, such as
photographic printers, which have relatively sophisticated processing capability (compared to a hand-held camera). The
challenge is to implement these face detection methods reasonably in a camera with limited memory resources, and
with low computational cost. If this can be done successfully, the detection of faces in a scene will then serve as a
springboard to numerous other improvements in the image capture process. In addition, it would be useful to detect
faces in order to implement downstream activities after image capture, e.g., face detection could provide evidence of
up/down orientation for subsequent printing (for example, of index prints).

[0005] It is an object of the invention to capture images and detect one or more of the human faces contained in the
images, for the purposes of adding value to the image capture process and improving quality in the captured image.
[0006] The present invention is directed to overcoming one or more of the problems set forth above. Briefly
summarized, according to one aspect of the present invention, a method for determining the presence of a face from image
data includes a face detection algorithm having two separate algorithmic steps: a first step of prescreening image data
with a first component of the algorithm to find one or more face candidate regions of the image based on a comparison
between facial shape models and facial probabilities assigned to image pixels within the region; and a second step of
operating on the face candidate regions with a second component of the algorithm using a pattern matching technique
to examine each face candidate region of the image and thereby confirm a facial presence in the region, whereby the
combination of these components provides higher performance in terms of detection levels than either component
individually. In a camera implementation, a digital camera includes an algorithm memory for storing an algorithm
comprised of the aforementioned first and second components and an electronic processing section for processing the
image data together with the algorithm for determining the presence of one or more faces in the scene. Facial data
indicating the presence of faces may be used to control, e.g., exposure parameters of the capture of an image, or to
produce processed image data that relates, e.g., color balance, to the presence of faces in the image, or the facial
data may be stored together with the image data on a storage medium.

[0007] In another aspect of the invention, a digital camera includes a capture section for capturing an image and
producing image data; an electronic processing section for processing the image data to determine the presence of
one or more faces in the scene; face data means associated with the processing section for generating face data
corresponding to attributes of at least one of the faces in the image; a storage medium for storing the image data; and
recording means associated with the processing section for recording the face data with the image data on the storage
medium. Such face data corresponds to the location, orientation, scale or pose of at least one of the faces in the image.
[0008] In a further aspect of the invention, a digital camera includes an algorithm memory storing a face detection
algorithm for determining the presence of one or more faces in the image data and a composition algorithm for
suggesting composition adjustments based on certain predetermined composition principles; and an electronic processing
[...] for determining the presence of one or more faces [...]. The processing section then generates
face data corresponding to [...] as well as composition suggestions corresponding to deviation of the face data
from the predetermined composition principles.

[0009] In yet a further aspect of the invention, [...] processing section for generating face data corresponding
to at least one of the [...] the face data on the [...]

FIG. 1 [...] camera elements in accordance with the invention.

FIG. 2 is a block diagram of the image capture section of the camera shown in Figure 1.

FIG. 3 is a [...] diagram of camera operations involved in [...] the camera shown in Figure 1 [...].

FIG. 4 is a [...] chart of operations involved in the operation of the camera shown in Figure 1 [...].

FIG. 5 is a flow chart showing the generation of composition suggestions.

FIG. 6 is an illustration of an image [...] for application of the rule of thirds.

FIGS. 7A [...] examples [...] one of the face detection algorithms.

FIGS. 8 [...] show graphical displays of probability densities for skin, which are used in one of the face
detection algorithms.

FIGS. 9 [...] show graphical displays of probability densities for hair, which are used in one of the face
detection algorithms.

[...] electronic [...] film [...] are well known, the present description [...]

[...] the face detection aspect of the invention is implemented as a computer program [...]

[...] exposure control mechanism 24 includes an aperture and shutter for regulating the exposure of the [...]
electronic [...] the capture section [...] include an analog storage device 25, such as a conventional photographic
film. In the case of well-known APS film, which includes a magnetic recording layer, a recording device 26 can record
annotation data regarding the captured images on the magnetic layer. A flash unit 27 is also provided for illuminating
the image 22 when ambient light is insufficient.

[0024] The CPU 30 is interconnected via a system bus 40 to a random access memory (RAM) 42, a read-only memory
(ROM) 44, an input/output (I/O) adapter 46 (for connecting the capture section 20, the digital memory 32, the recording
unit 26 and the flash 27 to the bus 40), a communication adapter 48 (for connecting directly to an information handling
system or a data processing network, such as the Internet), a target tracking stage 49 (for generating a measuring
frame 49a that tracks the faces), a user interface adapter 50 (for connecting user interface devices such as a shutter
button 52, flash controls 54, programmed exposure selections 56, a user manipulated display cursor 58 and/or other
user interface devices to the bus 40), an algorithm interface adapter 60 (for connecting various stored algorithms to the
bus 40, including a face detection algorithm 90) and a display interface 70 (for connecting the bus 40 to the display
device 34). The CPU 30 is sufficiently powerful and has sufficient attached memory 42 and 44 to perform the face
detection algorithm 90. A training database 72, connected to the bus 40, contains sufficient training data to enable the
face detection algorithm 90 to work for a very wide range of imaging conditions. In the preferred embodiment, as will
be described in detail, the face detection algorithm includes two component algorithms: a first component algorithm
that estimates a face candidate region of the image based on a comparison between facial shape models and facial
probabilities assigned to image pixels within the region, and a second component algorithm operative on the face
candidate region using pattern analysis to examine each region of the image and thereby confirm a facial presence in
the region. The advantage of this combination is that the first component algorithm can be designed to operate quickly,
albeit with the potential for false positives, and the second component algorithm can restrict its more computationally
intensive processing to the relatively few regions that have passed the first algorithm.
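
The benefit of the two-component arrangement can be illustrated with a minimal sketch. It is not the implementation of the face detection algorithm 90; the function names `detect_faces` and `count_calls`, the region tuple layout and the toy predicates are hypothetical and serve only to show that the costly second component runs solely on regions admitted by the fast first component.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical region representation: (x, y, width, height).
Region = Tuple[int, int, int, int]


def detect_faces(
    regions: Iterable[Region],
    prescreen: Callable[[Region], bool],
    verify: Callable[[Region], bool],
) -> List[Region]:
    """Apply a fast prescreen, then a costly verifier to the survivors only."""
    # First component: fast, allowed to admit false positives.
    candidates = [r for r in regions if prescreen(r)]
    # Second component: expensive, sees only the few candidate regions.
    return [r for r in candidates if verify(r)]


def count_calls(fn: Callable[[Region], bool]) -> Callable[[Region], bool]:
    """Wrap a predicate so the number of invocations can be inspected."""
    def wrapper(region: Region) -> bool:
        wrapper.calls += 1
        return fn(region)
    wrapper.calls = 0
    return wrapper


if __name__ == "__main__":
    regions = [(x, 0, 10, 10) for x in range(100)]
    cheap = count_calls(lambda r: r[0] % 10 == 0)   # stand-in for the first component
    costly = count_calls(lambda r: r[0] % 20 == 0)  # stand-in for the second component
    faces = detect_faces(regions, cheap, costly)
    print(len(faces), cheap.calls, costly.calls)
```

In this toy run the first predicate examines all 100 regions, while the second examines only the 10 that passed, which is the cost saving the paragraph describes.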

[0025] The results of face detection are used to control a number of functions of the camera, which are embodied
in the algorithms connected to the data bus 40 through the interface adapter 60. The face detection results are tracked
by the target tracking stage 49, which sets and manipulates the measuring frame 49a to track, e.g., the centroid of one
or more face locations. The measuring frame is used as described in U.S. Patent No. 5,430,809, which is incorporated
by reference, to limit the data collected for purposes of autofocus, auto exposure, auto color balance and auto white
balance to the facial areas. The measuring frame may be a small spot-like area or it may be configured to have borders
generally coinciding with the borders of one or more faces; in either case it is intended to confine the data collected
for the algorithms to face data or some sample thereof. These algorithms include a red eye correction algorithm 80,
an exposure control algorithm 82, a flash control algorithm 84, a focus control algorithm 86, a color balance algorithm
88 and a composition algorithm 92. The red eye correction algorithm 80 adjusts the stored digital pixel values to remove
a red eye condition produced by the flash unit 27. The exposure control algorithm 82 determines settings from the
measuring frame 49a for the image exposure control mechanism 24 in the image capture section 20 of the camera so
that the faces are properly exposed. In conjunction with the exposure control determination, the flash algorithm 84
determines whether or not to fire the flash for optimal capture of the facial images. The camera utilizes the focus
algorithm 86 to derive distance data from the measuring frame 49a and to set a pointable focus mechanism in the
optical section 21 using the results of a framing image so that a final captured image is properly focused on the face
regions. The color balance algorithm 88 is applied to the digital image file in order to optimize the representation of the
skin regions within the measuring frame 49a so that they match the expected color range of skin tones.
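
As an illustration of how a measuring frame might be derived from face detection results, the following sketch computes both variants mentioned above: a spot at the centroid of the face locations, and a frame whose borders coincide with the outer borders of the detected faces. The function name `measuring_frame` and the (x, y, width, height) box layout are assumptions made for this example; the '809 patent should be consulted for the actual measuring-frame mechanism.

```python
from typing import List, Tuple

# Hypothetical face box representation: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def measuring_frame(faces: List[Box]) -> Tuple[Tuple[float, float], Box]:
    """Return (centroid of face centers, frame enclosing all faces)."""
    if not faces:
        raise ValueError("at least one detected face is required")
    # Centroid of the face centers: usable as a small spot-like frame.
    cx = sum(x + w / 2 for x, _, w, _ in faces) / len(faces)
    cy = sum(y + h / 2 for _, y, _, h in faces) / len(faces)
    # Enclosing frame: borders coincide with the outermost face borders.
    x0 = min(x for x, _, _, _ in faces)
    y0 = min(y for _, y, _, _ in faces)
    x1 = max(x + w for x, _, w, _ in faces)
    y1 = max(y + h for _, y, _, h in faces)
    return (cx, cy), (x0, y0, x1 - x0, y1 - y0)


if __name__ == "__main__":
    centroid, frame = measuring_frame([(10, 10, 20, 20), (50, 30, 10, 10)])
    print(centroid, frame)
```

Data for autofocus, exposure or color balance would then be sampled only from pixels inside the returned frame.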
[0026] The image display device 34 enables a photographer to preview an image before capture and/or to view the
last image captured. In addition, an optical viewfinder 28 is provided for previewing an image. Moreover, the CPU 30
may employ the face detection algorithm to highlight faces within the viewed image if needed. For this purpose, a
semi-transparent liquid crystal display (LCD) overlay 29 may be provided in the optical viewfinder 28; an LCD driver 29a
activates certain areas of the LCD overlay 29 corresponding to one or more face locations in response to face location
data from the CPU 30 (such an LCD mask is disclosed in U.S. Patent No. 5,103,254, which is incorporated herein by
reference). Also, the CPU 30 can generate highlighted or outlined faces by driving the pattern generator 74 via the
display interface 70 to display, e.g., a box over a face in a viewing area shown on the display device 34. Furthermore,
the faces can be marked by a photographer by moving the cursor 58 on the viewing area of the display device 34 so
that, e.g., it overlies a face or it draws a box around a face. This could also be done through the LCD driver 29a and
the LCD overlay 29 in the optical viewfinder 28.

[0027] Another advantage of the present invention is that data associated with the detection of faces in an image
could be automatically recorded and included with or as an annotation of an image. This permits the automatic recording
of significant subjects within a photographic record of events without requiring the annotation to be done by the
photographer at the time of image acquisition or at a later time. The detection of faces in the scene then opens the way
for significant additional enhancements to the image capture event and to subsequent processing of the image. For
example, face detection will provide a convenient means of indexing images for later retrieval, for example by fetching
images containing one or more people as subjects. Consequently, running the face detection algorithm provides face
data corresponding to one or more parameters such as location, orientation, scale and pose of one or more of the
[...]. [...] have been detected, a simple face recognition algorithm can be applied to
identify faces from a small gallery of training faces that the camera has previously captured with help from the [...]

[0028] [...] understood that a further embodiment of the invention is a hybrid camera which simultaneously
[...] the scene [...] such as the sensor 23, and a film [...]
such as the APS film 25. In this embodiment, the CPU 30 processes the image data from the image sensor 23 to
[...] of one or more faces in the scene, and [...]
[...] of the faces in the image. Such face data could be displayed to the user of the camera
[...] to evaluate the captured image. If the face data (or image) would suggest a problem with
the captured image, the user would have the opportunity to recapture the image on another frame of the film 25.
Additionally, the face data could be written on the magnetic layer of the film medium 25 by activation of the recording
unit 26.

[0029] As shown in the diagrams of Figures 3 and 4, respectively, the camera operates first in a framing mode and
then in a final imaging mode. In each mode, the camera offers a number of automated features to assist the
photographer. The photographer has the option of disabling the framing mode through the user interface adapter 50, thereby
disabling acquisition of the framing image and going directly to the final imaging mode.

Framing Mode 

[0030] [...] then performs the face detection algorithm 90 in step 120, by which it attempts to
detect any faces in the framing image and indicate their locations to the photographer in the viewfinder 28 or on the
display device 34. More specifically, the face detection algorithm utilizes face training data from the training database
72 to find faces. If faces are detected in the decision block 130, then face location data is stored [...].
[...] return to the beginning via path 132 and slightly repose the scene and [...]
detection, or can choose in a manual decision block 134 to provide manual detection input to the camera [...]
sensitive screen and stylus (not shown). Then, armed with knowledge of face presence and face
[...] framing image, the camera 10 is able to provide valuable services to the photographer that can be used
[...] image. Such services include focus assistance, [...]

[0031] Focus assistance. Many modern cameras provide automatic focusing or user-designated focusing using a
[...] (such as a voice instruction). Alternatively, as shown in the aforementioned '809 patent, focusing
can be performed within a measuring frame that is set to include a face. In connection with the present invention, after
[...] the auto-focusing system in the optical section 21 to select a particular focus detection area that
will focus the image optimally for the preponderance of the faces in the scene. (Alternatively, the focus could be set
optimally for the largest face in the scene, which is presumed to constitute the primary subject.)

[0032] Exposure and flash determination. The camera 10 provides automatic exposure control and flash engagement
through its exposure control algorithm 82 and flash [...]
used for both ambient and flash exposure. The exposure control functionality provided by this patent can be confined
to, or weighted for, a facial area located within the measuring window 49a described in the aforementioned
'809 patent. Since people are usually the most important subject in images in which they appear, it is reasonable to
choose the exposure to optimize the appearance of the faces of people, unless directed otherwise by the photographer.
After performing face detection on the framing image, the camera will utilize its auto-exposure algorithm 82 to set
image exposure optimally in a step 160 for the detection area corresponding to the preponderance of the faces in the
scene. (Alternatively, the exposure could be set optimally for the largest face in the scene, which is presumed to
constitute the primary subject.) Similarly, the exposure control algorithm 82 will determine whether to fire the flash 27 based
on its assessment of the adequacy of the illumination of faces in the scene. If the illumination is inadequate, the flash
control algorithm 84 will activate and control the flash unit 27 in a step 170.

[0033] Composition aids. The face detecting camera 10 provides a composition-assistance mode in step 180 in
which composition advice is provided to the photographer. Many consumer photographs suffer from poor image
composition from an aesthetic point of view. Along with improper focus and exposure, poor composition is probably a
leading cause of dissatisfaction with consumer image prints. A number of heuristic "rules-of-thumb" have become
widely accepted as good principles of composition that result in pleasing images. For example, a small main subject
frequently makes for an uninteresting print. Also, the "rule of thirds" calls for the main subject to be placed at roughly
the one-third point in the image, either vertically, horizontally, or both. Such principles are discussed in detail in Grill,
T. and Scanlon, M., Photographic Composition, Amphoto Books, 1990.

[0034] The face detecting camera 10 provides the composition-assistance mode 180 in which, based on the results
of face detection in the framing image, a composition algorithm 92 is enabled to generate composition suggestions
that appear in the viewfinder or the display device 34. The composition algorithm follows the steps expressed in Figure
5, although it should be clear that other composition elements could be examined (such as described in the Grill and
Scanlon reference). One aid compares the area of the largest face detected to the overall image area (step 181). If
the comparison exceeds a threshold (step 182) indicating the faces are too small, the display 34 suggests that the
camera be moved closer to the main subject (step 183). A second aid compares (step 184) centroids of faces to grid
lines as shown in Figure 6 that are representative of the rule of thirds, namely, positions in the image where principal
subject matter tends to be most pleasing (which is described in more detail in the Grill and Scanlon reference, page 22).
If faces are substantially off the grid lines (step 185), then the display 34 suggests placing the main subject according
to the rule of thirds to achieve a more pleasing image (step 186). A third aid locates faces intersecting image borders
(step 187). If a threshold indicates that a substantial amount of the face is cut off by the camera aperture (step 188),
then the display 34 is set to alert the photographer (step 189). A fourth aid relates the centroids of the faces to a
horizontal line (step 190). If the faces seem to lie along a common horizontal line (step 192), the display 34 suggests
that the heights of faces in an image of groups of people be varied, rather than aligned horizontally, to produce a more
interesting image. The illustration in Figure 5 is meant to be generally illustrative of such an algorithm, and other
composition principles such as described in the aforementioned Grill and Scanlon reference, which is incorporated herein
by reference, may be implemented in a similar fashion.
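
The four aids can be sketched in a few lines. The following is an illustrative approximation only, not the composition algorithm 92 itself: the function name `composition_suggestions`, the suggestion labels, the (x, y, width, height) face boxes and every threshold value are assumptions chosen for the example, since the description above does not specify them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x, y, width, height)


def composition_suggestions(
    faces: List[Box],
    img_w: float,
    img_h: float,
    min_area_fraction: float = 0.02,  # assumed threshold for "face too small"
    thirds_tolerance: float = 0.08,   # assumed allowed distance from a thirds line
    max_cut_fraction: float = 0.25,   # assumed tolerated cut-off part of a face
    row_tolerance: float = 0.03,      # assumed spread for "same horizontal line"
) -> List[str]:
    """Return composition hints derived from detected face boxes."""
    hints: List[str] = []
    if not faces:
        return hints

    # Aid 1 (steps 181-183): largest face area relative to the image area.
    largest = max(w * h for _, _, w, h in faces)
    if largest / (img_w * img_h) < min_area_fraction:
        hints.append("move_closer")

    # Aid 2 (steps 184-186): face centroids relative to the rule-of-thirds lines.
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in faces]

    def off_grid(cx: float, cy: float) -> bool:
        dx = min(abs(cx / img_w - 1 / 3), abs(cx / img_w - 2 / 3))
        dy = min(abs(cy / img_h - 1 / 3), abs(cy / img_h - 2 / 3))
        return min(dx, dy) > thirds_tolerance

    if all(off_grid(cx, cy) for cx, cy in centers):
        hints.append("rule_of_thirds")

    # Aid 3 (steps 187-189): faces substantially cut off by the image border.
    for x, y, w, h in faces:
        vis_w = max(0.0, min(x + w, img_w) - max(x, 0.0))
        vis_h = max(0.0, min(y + h, img_h) - max(y, 0.0))
        if 1.0 - (vis_w * vis_h) / (w * h) > max_cut_fraction:
            hints.append("face_cut_off")
            break

    # Aid 4 (steps 190-192): several faces lying on one horizontal line.
    if len(centers) >= 2:
        ys = [cy for _, cy in centers]
        if (max(ys) - min(ys)) / img_h < row_tolerance:
            hints.append("vary_heights")

    return hints


if __name__ == "__main__":
    print(composition_suggestions([(290, 190, 20, 20)], 600, 400))
```

For a single small face in the center of a 600 x 400 frame, the sketch proposes moving closer and applying the rule of thirds. A camera would map such labels to messages on the display 34.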


Final Image Mode 

[0035] Immediately after capture and processing of the framing image as shown in Figure 3, the camera is ready to
acquire the final image as shown in Figure 4, having provided the photographer with the aids mentioned in steps
150-180 as described in the previous section. The initial steps 200-240 shown in Figure 4 are identical to similarly identified
steps 100-140 in Figure 3, and therefore will not be further described. Additionally, in Figure 4, further aids operate
directly on the final image. As mentioned before, the photographer may choose to operate only with a final image
(eliminating the framing image) if only the second group of aids is desired. Alternatively, if the framing image seen on
the display device 34 was deemed satisfactory to the user, it can be saved as the permanent image. In either case,
several services are provided as part of the final image mode, including optimal color balance, red eye notification and
correction, orientation marking and face labeling, as follows.

[0036] Optimal color balance. While the human visual system demonstrates marvelous ability to maintain
perceptual constancy of colors across different scene conditions, neither analogue nor digital cameras possess the same
capability. For example, to the human eye, the color of an object appears the same whether the object is viewed in
sunlight, sky-light, or tungsten light; whereas these three scene conditions, when captured on a single type of
photographic film, will necessarily lead to the reproduction of very different colors. Therefore, it is customary to apply color
balancing algorithms (CBAs) to captured images prior to printing or other display of the images. The current
state-of-the-art of automated color balance algorithms seeks mainly to compensate for the most prominent scene illuminant.
[0037] A better job of color balancing an image can be performed by taking into account some understanding of the
nature of the objects in the scene, and their relative importance. If an entire scene is reproduced correctly, and yet
the color of faces is noticeably wrong, then the reproduced image will not be acceptable. People are very sensitive to
incorrect reproductions of skin tones, although the specifics of perceived offenses of reproduction vary from culture to
culture. The variations in skin tones among different persons and ethnic groups can be statistically categorized and
understood. Furthermore, it fortuitously happens that the natural variations of skin colors and the offensive errors in
[...] unacceptable reproduction errors primarily concern the green-magenta dimension.

[0038] [...] location of faces in a scene can lead to color balancing in two
different ways. If only global image correction is available (as in optical printing of analogue images), then the estimate
[...] adjusted so as to result in a pleasing rendering of the skin tones.
[...] the location and sizes of faces on the magnetic layer of the analogue film medium 25
[...] photofinishing equipment to optimize the color balance for proper reproduction of skin tones on
the face. On the other hand, if a digital processing step is available, [...]
[...] considerations of illumination. This is the best possible scenario, leading to a better [...]
[...] means, because both the primary subjects (people) and the background regions can be
pleasingly reproduced. In either case, the camera 10 utilizes its color balance algorithm 88 in a face preferential
[...] balance for the image based at least in part [...].
More specifically, the CPU 30 interacts with the measuring frame 49a generated by the tracking stage 49 to collect color
[...] detected face(s) and then to weight the color balance algorithm 88 for the facial area.
[0039] Red eye notification and correction. A red-eye detection algorithm 80, such as the one disclosed in
[...], is run [...] against the final [...].
[...] presence of a face is used as additional [...] the red-eye algorithm 80
to help prevent false positive errors. A pair of detected red-eyes should be corroborated by the
[...] presence. The existence of red eye can also be provided by the red eye detection algorithm
to [...], which can designate an appropriate warning on the display device 34. After receiving red
eye notification, the photographer may choose to obtain another image. Or, the automatic red-eye correction algorithm
[...] remove the offensive red highlights in the eyes if the camera 10 is a digital camera.
[0040] Orientation marking. Many consumer photo-finishing orders are now returned with an index print of small
versions of each image in a sequence. The utility of the index print is diminished if the images are not all printed in the
[...] orientation. The presence of faces in an image provides a powerful cue as to its proper orientation. For
instance, the facial dimensions can be separated into their principal axes, and the longest axis can be taken as the
up-down axis; then one of the face detection algorithms to be described can distinguish [...].
The majority of faces will be upright or close to upright, in the sense of overall image
orientation. The face detection algorithm 90 in the face detecting camera 10 will determine orientation and tag each
captured image on the image storage device 32 (or 25) with a notation of the proper orientation as suggested by the
face orientation detected in the orientation step 280.
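
One way to separate facial dimensions into principal axes is through the second moments of the pixel coordinates in a detected face region. The sketch below is an assumption about how this could be done, not the method of the face detection algorithm 90; the names `major_axis_angle` and `up_down_axis` are hypothetical. It finds only the axis along which the face is longest and leaves the choice between "up" and "down" along that axis to a separate step.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of pixels in a face region


def major_axis_angle(points: List[Point]) -> float:
    """Angle (radians, relative to the x axis) of the longest principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Second central moments (covariance terms) of the region.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Standard orientation formula for a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def up_down_axis(points: List[Point]) -> str:
    """Classify the longest facial axis as 'vertical' or 'horizontal'."""
    angle = major_axis_angle(points)
    return "vertical" if abs(math.sin(angle)) > abs(math.cos(angle)) else "horizontal"


if __name__ == "__main__":
    tall_face = [(x, y) for x in range(10) for y in range(20)]
    print(up_down_axis(tall_face))
```

A region that is taller than it is wide yields a vertical up-down axis, suggesting an upright (or inverted) image; a wider region suggests that the image was captured with the camera rotated by a quarter turn.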

[0041] Face labeling. [...] a simple face recognition algorithm can be applied to identify
faces from a small gallery of training faces that the camera has previously captured with help from the user and stored
[...]. The gallery could contain the individuals in a family, for example, or children in a school
class. When a new image [...] the camera, [...] faces detected, the identity of each
[...] established by [...] the image in the digital storage 32 or magnetic
layer of the film 25. Such face identity information flows with the image into photofinishing or subsequent computer
[...]
[...] subsequently examined by component S to result in a final detection decision.
The Component W 

[0044] Wu et al. published a face detection algorithm (hereinafter, as modified, the component W) that is well suited 
for inclusion in a digital camera (see Wu, H., Chen, Q. and Yachida, M., "Face Detection from Color Images Using a 
Fuzzy Pattern Matching Method", IEEE Trans. Pattern Analysis and Machine Intelligence, 21(6), 557-563, 1999, which 
is incorporated herein by reference). The algorithm is very fast and requires very small amounts of both program 
memory and trained state. The Component W is a kind of ad-hoc pattern recognizer that searches for image windows 
that seem likely to contain faces based on color characteristics. The method essentially looks for windows in which 
the central portion seems likely to contain skin, based on its color and shape; and surrounding regions (around the top 
and sides of the skin) that seem likely to contain hair, again based on color and shape. Since the method is based on 
color signals, it requires that the imagery on which it operates be encoded in a meaningful color metric. 
[0045] The component W has a training phase and a test phase. The training phase comprises the collection of skin
and hair color distributions, and the gathering of shape models from suitable training examples. In the test phase, a
window is scanned over the image at a complete range of scales and positions. The component W implicitly assumes
that the upward orientation of the image is known, and that the faces are roughly aligned with the image orientation.
This assumption could be relaxed by carrying out the entire face search several times, once for each possible image
orientation (probably three times, since the camera would not be used upside down). In the test phase, the algorithm applies
the following steps once for each image window to be examined:

1) Compute skin and hair probability maps. Each pixel in a digitized image is compared with pre-determined
probability tables of skin and hair colors, leading to an a posteriori probability that the pixel represents human skin or
hair. The probability tables must be collected off-line and stored in the camera. They are collected with the same
imaging sensor as in the digital camera, using identical spectral sensitivities.

2) Convert probabilities to estimated area fractions via non-linearity. Face shape models are built from training 
examples, also off-line. These models encode the likelihood of the occurrence of skin and hair colors in each cell 
of a rectangular grid overlaid on spatially normalized human faces in a small set of standard head poses. 

3) Perform fuzzy pattern matching with face shape models. A rectangular window is scanned to each pixel 
position of the image in turn, and a judgment is made as to whether the window contains a face. To accommodate 
faces of varying sizes, the scanning process is repeated with windows varying over a range of sizes. The judgment 
of whether a face is present in a window of the image is based on a fuzzy comparison between the pre-determined 
face shape models and the actual distribution of a posteriori skin and hair probabilities in each cell of the window.
The fuzzy comparison makes use of parameterized non-linearities, as described in the Wu et al. article, that are 
adjusted in a calibration stage in order to provide the best results. 

Each of these steps is now described in more detail after introducing the face shape models. It should also be
understood that extensive detail can be found by referring to the Wu et al. article.
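
By way of illustration only, the scanning of a window over every position and over a range of sizes, as described in step 3, might be sketched as follows. The square window, the geometric `scale` factor between sizes, and the `step` parameter are assumptions made for this sketch; the text above specifies only that every pixel position and a range of window sizes are visited.

```python
def scan_windows(width, height, min_size, max_size, scale=1.25, step=1):
    """Yield (x, y, size) for every window position at every window size.

    Windows are assumed square for brevity. `scale` (growth factor between
    successive sizes) and `step` (pixel stride) are hypothetical parameters.
    """
    size = min_size
    while size <= min(max_size, width, height):
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield x, y, size
        # Grow the window, always by at least one pixel so the loop ends.
        size = max(size + 1, int(round(size * scale)))
```

Each yielded window would then be passed through steps 1 to 3 to decide whether it is a face candidate.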

[0046] Shape models. The head shape models are low-resolution representations of the spatial distribution of skin 
and hair in typical face poses. There is one model for skin and one model for hair for each distinct pose. Each model 
consists of m x n cells (currently, m=12 and n=10), with each cell encoding the fraction of the cell that is occupied by 
skin (for skin models), or hair (for hair models) for typical heads in a given pose. An image window can be spatially 
corresponded with the cells in the shape models. Depending on the window resolution, a single pixel or a block of 
pixels may correspond to each model cell. The models were built using a set of training images to which affine trans- 
formations have been applied, in order to place the two eyes in standard positions. The spatially normalized images 
were then manually segmented into skin and hair regions, and the fractional cell occupancy, at the resolutions of the 
models, was computed. An example of the shape models for frontal and right semi-frontal poses is shown in Figures 
7A-D. The models are stored in the training database 72 (shown in Figure 1) with gray-level encoding of the occupancy 
fraction. 
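
As a non-authoritative sketch of the fractional cell occupancy computation described above, the following takes a manually segmented binary mask (1 where a pixel was labeled skin, or hair) and returns the fraction of each model cell that is occupied. Treating m as the number of rows and n as the number of columns, and requiring the mask dimensions to be exact multiples of the grid, are assumptions of the sketch.

```python
def cell_occupancy(mask, m=12, n=10):
    """Return an m x n grid of occupancy fractions for a binary mask.

    mask is a list of rows of 0/1 values. Its height must be a multiple of m
    and its width a multiple of n (an assumption that keeps the sketch short).
    """
    height, width = len(mask), len(mask[0])
    if height % m or width % n:
        raise ValueError("mask size must be a multiple of the model grid")
    ch, cw = height // m, width // n  # pixels per cell (vertical, horizontal)
    model = []
    for r in range(m):
        row = []
        for c in range(n):
            occupied = sum(mask[y][x]
                           for y in range(r * ch, (r + 1) * ch)
                           for x in range(c * cw, (c + 1) * cw))
            row.append(occupied / (ch * cw))
        model.append(row)
    return model
```

Averaging such grids over all spatially normalized training faces in a given pose would give one skin model and one hair model for that pose.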

[0047] Compute hair and skin probability. The objective at this point is to acquire probability distributions for skin 
and hair colors from training images. The goal is to obtain probability tables of the form P(skin | color) and P(hair | 
color). Instead of using the Farnsworth perceptually uniform color space as suggested in the Wu et al. article, the 
present invention uses (L,s,t) color space as a preferred color metric for distinguishing skin and hair regions, and 
therefore performs probability training and application in the (L,s,t) color metric, where L = c(R+G+B); s = a(R-B); t =
b(R-2G+B); a, b and c are constants; and R, G and B are image values proportional to relative log scene exposure.
This metric has proven to be an effective color space in which to perform image segmentation. While all three channels 
are used, the luminance channel is separated from the combined chrominance channels in the probability histograms. 
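
The conversion to the (L,s,t) metric can be illustrated with the short sketch below. It reads the s channel as a(R-B), and the particular values chosen for the constants a, b and c are assumptions made for the example (the text states only that they are constants).

```python
import math

# Assumed normalizing constants; the text only says a, b and c are constants.
A = 1.0 / math.sqrt(2.0)
B = 1.0 / math.sqrt(6.0)
C = 1.0 / math.sqrt(3.0)


def rgb_to_lst(r, g, b):
    """Map log-exposure R, G, B values to (L, s, t).

    L is the luminance-like channel; s and t are the two chrominance channels.
    """
    lum = C * (r + g + b)
    s = A * (r - b)
    t = B * (r - 2.0 * g + b)
    return lum, s, t
```

With any choice of constants, a neutral pixel (R = G = B) has s = t = 0, which is why the luminance channel can be handled separately from the two chrominance channels in the probability histograms.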
[0048] To gather skin color statistics, an annotated database including some 1800 images was used, each image 
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where the parameters a and b (different from those in the non-linearity mapping of the previous section) determine the 
shape of the comparison, and the I and M subscripted variables represent the skin and hair probabilities from an image
region and a model cell, respectively. Increasing b gives exponentially steeper penalties to differences between the
shape model and the image window. In this embodiment, the values a=2 and b=1 were selected after some
experimentation. The similarity score for the entire image window is taken as the average similarity score over all cells
of the shape model. A threshold can be applied to the similarity measure to identify face candidates detected by the
Component W.
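
The averaging and thresholding of cell similarities can be sketched as follows. The per-cell expression used here, exp(-a * d^b) with d the Euclidean distance between the image and model (skin, hair) pairs, is only an assumed form chosen to be consistent with the description of the parameters a and b; it is not taken from the Wu et al. article. The threshold value is likewise hypothetical.

```python
import math


def cell_similarity(i_skin, i_hair, m_skin, m_hair, a=2.0, b=1.0):
    """Assumed similarity between an image cell and a model cell (1.0 = identical)."""
    d = math.hypot(i_skin - m_skin, i_hair - m_hair)
    return math.exp(-a * d ** b)


def window_similarity(image_cells, model_cells, a=2.0, b=1.0):
    """Average the cell similarities over all cells of the shape model.

    Both arguments are equal-length sequences of (skin, hair) pairs.
    """
    scores = [cell_similarity(i_s, i_h, m_s, m_h, a, b)
              for (i_s, i_h), (m_s, m_h) in zip(image_cells, model_cells)]
    return sum(scores) / len(scores)


def is_face_candidate(image_cells, model_cells, threshold=0.8):
    """Apply a (hypothetical) threshold to the window similarity."""
    return window_similarity(image_cells, model_cells) >= threshold
```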



Component S 

[0051] Complete details of the Schneiderman algorithm (hereinafter, component S) appear in Schneiderman, H. and 
Kanade, T., "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition", Proc. CVPR 
1998, 45-51, which is incorporated herein by reference. The main steps of the method are outlined here in order to 
provide a self-contained description and to highlight differences and improvements with respect to the reference. The 
Component S implements a Bayesian classifier that performs maximum a posteriori classification using a stored
probability distribution that approximates the conditional probability distribution P(face | image). The method is called
Bayesian because of the use of Bayes' theorem to convert the a priori measured training distribution P(image | face) into
the posterior distribution in the presence of evidence from an image. The evidence consists of the pixel values in a
spatial- and intensity-normalized image window. The use of Bayes' theorem is mandated by the observation that image
evidence can be ambiguous. In some cases, objects and scenes not in the class of interest (i.e. faces, in this context)
can give rise to image patterns that can be confused with class (=face) objects. Bayes' theorem requires the collection
of representative non-class images, known as "world" images. The collection of world images proves to be the most 
difficult and critical process involved with training the algorithm. The difficulty lies in the fact that the world is very 
diverse. Faces are not diverse (at least when compared to the world), and so collection of training examples of faces 
is quite straightforward. This difficulty will be discussed at length in a following section on training. 
[0052] The simplifications made to the distribution 

P(face | image) (1)



that are described herein change a huge, uncountable distribution into a very practical one. The goal is to arrive at a 
simplified distribution P(face | distilled-image-features), where the distilled image features can be counted up and 
grouped during training in, say, one million bins. A heuristic of training classifiers would indicate that two orders of 
magnitude more examples than bins are needed. Hence, 10^8 examples might be required to populate 10^6 bins in a
statistically meaningful way. It is eminently possible to collect that many examples, especially if we are permitted to 
generate some of them computationally, and if the "unit" of an example is something smaller than an entire human face. 
[0053] Simplifications are applied in the order listed here and are described in the sections that follow: 



(1) standardize face region size 

(2) decompose face region into subregions 

(3) ignore dependencies between subregions 

(4) project subregions to lower dimension representation using PCA 

(5) code projections using sparse coefficients 

(6) quantize sparse coefficients 

(7) decompose appearance and position 

(8) ignore position for uncommon patterns 

(9) vector quantize positional dependence for common patterns 

(10) apply (1)-(9) at multiple resolutions, assuming independence between resolutions 

1. Standardize face region size. Spatially normalized faces will be presented in a 56x56 pixel window. This 
simplification changes equation (1) into 
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P(face | region) (2) 
where region is exactly a rasterized vector of pixels from a 56x56 pixel image window 
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for subsequent processing. 

6. Quantize sparse coefficients. Further compression of subregion representation occurs through discrete
quantization of the nine coefficients using a Lloyd-Max quantizer. This quantizer minimizes the mean-square
quantization error under the assumption of a Gaussian distribution of the independent variable. For common values of
the number of quantization values, the bin breakpoints and the reconstruction levels of Lloyd-Max quantizers are
tabulated in Lim, J., Two-Dimensional Signal and Image Processing, Prentice-Hall: New Jersey, 1990. To test the
validity of the Gaussian distribution assumption, the actual distribution of the projection coefficients of the training
set was collected, from which it was seen that the Gaussian assumption closely matches the actual distribution.

The choice of the number of sparse coefficients retained and the number of quantization levels allocated to
each coefficient determines the number of possible quantization values that encode image subregions. Based on
the choices of six prominent dimensions, with choices of 8, 4, or 2 quantization levels for each dimension, the
algorithm as implemented can represent each subregion by one of approximately 1,000,000 numbers. These
quantized numbers are somewhat inscrutably called "q1" values in the Schneiderman et al. reference. The number of
possible q1 values is an algorithm sizing parameter referred to as "nq1" in that reference.

The compression advantage of this quantization scheme becomes clear when it is seen that 256^256 possible
subregion patterns are encoded in 10^6 distinct numbers. In fact, it is possible to consider this quantization scheme
as a form of image coding. Reconstruction of the image from its coding gives a sort of approximation to the original
image. Figure 10 shows an original image and its reconstruction following PCA projection and sparse coding and
quantization. More specifically, Figure 10(a) shows the original image, Figure 10(b) shows a reconstruction from
projections of subregions into twelve dimensional principal component space and Figure 10(c) shows a
reconstruction from sparse coded and quantized version of Figure 10(b). (Note that images (b) and (c) do not show all
the encoded information. Rather, they show the reconstructions from the encoding with subregions aligned with a
tiled grid of 56x56 face regions. Simultaneous encodings capture further image information as the subregions are
offset relative to the region grid.) Following the quantization step, the probability expression (5) is further simplified
to

$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i) \qquad (6)$$
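
For illustration, the Lloyd-Max design referred to in step 6 can be approximated from training samples by the standard Lloyd iteration, which alternates between placing breakpoints midway between reconstruction levels and moving each level to the mean of the samples in its bin. The sketch below is a generic implementation under that assumption and is not taken from the tabulated values in the cited text; the function names and the iteration count are hypothetical.

```python
import bisect


def lloyd_max(samples, levels, iterations=50):
    """Design a scalar quantizer that reduces mean-square error on samples.

    Returns (breakpoints, reconstruction_levels); there is one fewer
    breakpoint than there are levels.
    """
    data = sorted(samples)
    n = len(data)
    # Start the reconstruction levels at evenly spaced sample quantiles.
    recon = [data[(2 * k + 1) * n // (2 * levels)] for k in range(levels)]
    for _ in range(iterations):
        # Breakpoints lie midway between neighboring reconstruction levels.
        breaks = [(recon[k] + recon[k + 1]) / 2.0 for k in range(levels - 1)]
        sums = [0.0] * levels
        counts = [0] * levels
        for x in data:
            k = bisect.bisect_left(breaks, x)
            sums[k] += x
            counts[k] += 1
        # Each level moves to the mean of the samples that fall in its bin.
        recon = [sums[k] / counts[k] if counts[k] else recon[k]
                 for k in range(levels)]
    breaks = [(recon[k] + recon[k + 1]) / 2.0 for k in range(levels - 1)]
    return breaks, recon


def quantize(x, breaks):
    """Return the index of the quantization bin that contains x."""
    return bisect.bisect_left(breaks, x)
```

Running the design on the projection statistics of each retained coefficient, with 8, 4 or 2 levels as appropriate, would give the breakpoints needed to turn a subregion into a single q1 value.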



7. Decompose appearance and position. At this point in the chain of simplifications of the probability distribution, 
expression (6) is expanded to explicitly include both the pixel pattern of a subregion and its position within the face 
region. Equation (6) is replaced with 



$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i, pos_i) \qquad (7)$$



where each subregion is now represented by its quantization value and its position within the face region. Inter- 
pretation of expression (7) intuitively leads to thoughts like the following: eye-like patterns ought to occur in face 
regions only in the subregions likely to contain eyes. 

8. Ignore position for uncommon patterns. Given that 1,000,000 quantization levels and 196 positions are
possible for each subregion, further simplifications of expression (7) must occur. Two more simplifications are
made to this expression. First, a decision is taken to encode the positional dependence of only the most commonly
occurring q1 patterns. To this end, a large sorting step orders the q1 patterns by decreasing frequency of occurrence
in the training set. All q1 patterns that sort below an occurrence threshold will have their positional dependence
replaced by a uniform positional distribution. The number of q1 patterns whose positional distribution is to be
explicitly learned during training is an algorithm sizing parameter referred to as "nest" in the Schneiderman
reference. For the uncommon patterns, expression (7) becomes
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npos ( 8 ) 



where npos=196 is the number of possible subregion positions.

For each vector x
    Find the closest current pattern center
    Calculate the distance d between x and the closest center. The sum
    squared error (SSE) metric is used.
    If d<threshold
        Add x to cluster; update cluster center
    else
        Seed new cluster with x

$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i, pos'_i) \qquad (9)$$
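
The clustering loop given in pseudocode above can be sketched in Python as follows. Updating the cluster center as the running mean of its members is an assumption, since the pseudocode says only "update cluster center"; the function name and return values are likewise hypothetical.

```python
def cluster_vectors(vectors, threshold):
    """Single-pass clustering using the sum squared error (SSE) metric.

    Returns (centers, labels), where labels[i] is the index of the cluster
    to which vectors[i] was assigned.
    """
    centers = []  # current center of each cluster
    counts = []   # number of vectors in each cluster
    labels = []
    for x in vectors:
        best, best_d = None, None
        for k, center in enumerate(centers):
            d = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
            if best_d is None or d < best_d:
                best, best_d = k, d
        if best is not None and best_d < threshold:
            # Add x to the closest cluster and move its center (running mean).
            counts[best] += 1
            n = counts[best]
            centers[best] = [ci + (xi - ci) / n
                             for xi, ci in zip(x, centers[best])]
            labels.append(best)
        else:
            # No center is close enough: seed a new cluster with x.
            centers.append(list(x))
            counts.append(1)
            labels.append(len(centers) - 1)
    return centers, labels
```

Because the loop makes a single pass, the result depends on the order in which the vectors are presented and on the chosen threshold.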
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$$\prod_{j=1}^{nmags} \prod_{i=1}^{nsubs} P(\mathrm{face} \mid q1_i^j) \qquad (10)$$

A typical example would be that of a single face captured at nmags=3 levels of pixel resolution. At each resolution,
the eyes must reside at standard positions.

[0054] Full form of simplified probability distribution. Gathering together expressions (8) and (10), and applying 
Bayes' theorem to relate prior probabilities gathered during training to the posterior probabilities in these expressions
leads to the full form (11) of the estimated likelihood of face presence in an image region. Details of the complete 
derivation of this equation appear in the Schneiderman reference. 



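$$P(\mathrm{face} \mid \mathrm{region}) = \prod_{j=1}^{nmags} \prod_{i=1}^{nsubs} \frac{P(q1_i^j \mid \mathrm{face})\, P(pos'_i \mid q1_i^j, \mathrm{face})\, P(\mathrm{face})}{\dfrac{P(q1_i^j \mid \mathrm{face})}{npos}\, P(\mathrm{face}) + \dfrac{P(q1_i^j \mid \overline{\mathrm{face}})}{npos}\, P(\overline{\mathrm{face}})} \qquad (11)$$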



In this expression, P(face) and P(\overline{face}) represent the prior probabilities that an image region either does or
does not contain a face. In the absence of this knowledge, uniform priors equal to 1/2 are used, leading to a further
simplification in the above expression (11). This assumption about prior probabilities does not affect the performance of
the algorithm when used for pattern recognition of faces. Rather, it results in the presence of a scaling factor that must
be taken into account when interpreting the algorithm output as a probability value.

[0055] Training steps - Phase I. While actual training of algorithm S involves a number of discrete steps, the training 
divides naturally into two major phases. The goal of the first phase is to obtain specific parameters of the quantization 
of face subregions. The initial step is to capture the covariance matrix and then principal components of the subregions 
from the training set. As part of this step, following extraction of the principal components, another pass is made through 
all the training subregions to gather the statistics of their projections onto those twelve principal dimensions. The
projection data are then analyzed. The projection statistics are fed back into the training program to enable optimal design
of the Lloyd-Max quantizer. Since the variation of face patterns is quite large when considered across different scales 
of resolution, this process of extracting principal components and the statistical distribution of the training data along 
those components must be repeated for each image resolution. 

[0056] Training steps - Phase II. The second phase of training starts by passing through the training set and per- 
forming the quantization of each subregion of each face example. As mentioned above, the training set can be expanded 
by creating slightly perturbed versions of each training exemplar. The frequency with which quantized values appear 
is counted in a histogram having roughly 1,000,000 bins. Simultaneously, subregion positions at which each quantized
value occurs are accumulated. A sort operation arranges the quantization frequency histogram in decreasing order of
occurrence count. For the nest most frequently occurring quantized patterns, the positional distributions enter into the
vector quantization algorithm. Following vector quantization, only nq2 seminal positional distributions are retained, and
each of the nest frequent quantization values will have a positional distribution approximated by the retained distributions.
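
A minimal sketch of the counting and sorting operations of Phase II is given below. It assumes that each training subregion has already been reduced to a (q1, position) pair; the function and variable names are hypothetical, and the subsequent vector quantization of the positional distributions is not shown.

```python
from collections import Counter, defaultdict


def gather_phase2_statistics(observations, n_est):
    """Count quantized values and accumulate positions for the frequent ones.

    observations is an iterable of (q1, pos) pairs, one per training
    subregion. Returns (freq, pos_dists): freq maps each q1 value to its
    occurrence count, and pos_dists maps each of the n_est most frequent q1
    values to its normalized positional distribution.
    """
    freq = Counter()
    positions = defaultdict(Counter)
    for q1, pos in observations:
        freq[q1] += 1
        positions[q1][pos] += 1
    pos_dists = {}
    # most_common() sorts the histogram by decreasing occurrence count.
    for q1, total in freq.most_common(n_est):
        pos_dists[q1] = {pos: count / total
                         for pos, count in positions[q1].items()}
    return freq, pos_dists
```

The q1 values that do not appear in `pos_dists` are the uncommon patterns, for which a uniform positional distribution is used.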
[0057] Applying the face detector. To use the trained face detection algorithm at test time, the computation of 
expression (11) must be applied to an image region on which spatial and intensity normalization have been conducted. 
Three different resolution versions of each candidate face region are required. The quantization value for each subre- 
gion is computed, and the various probability terms in expression (11) are extracted from the probability tables created 
during algorithm training. 

[0058] To use expression (11) for face detection, a probability threshold must be selected. When the posterior prob- 
ability exceeds the threshold, then face detection has occurred. After the algorithm training process has been com- 
pleted, the threshold is determined by studying the classification performance of the algorithm when applied to a ver- 
ification set of face and non-face images. The threshold is set for optimal performance on the verification set, taking 
into account the relative importance of false positive and false negative errors. 
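
One simple way of setting the threshold from a verification set is sketched below. The weighted-error criterion and the cost parameters are assumptions made for the example; the text states only that the relative importance of false positive and false negative errors is taken into account.

```python
def choose_threshold(face_scores, nonface_scores, fp_cost=1.0, fn_cost=1.0):
    """Pick the threshold with the lowest weighted error on a verification set.

    A detection is declared when a score exceeds the threshold. The candidate
    thresholds are the observed scores plus minus infinity (accept everything).
    Returns (threshold, cost).
    """
    candidates = [float("-inf")]
    candidates += sorted(set(face_scores) | set(nonface_scores))
    best_t, best_cost = None, None
    for t in candidates:
        false_neg = sum(1 for s in face_scores if s <= t)
        false_pos = sum(1 for s in nonface_scores if s > t)
        cost = fn_cost * false_neg + fp_cost * false_pos
        if best_cost is None or cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Raising `fp_cost` relative to `fn_cost` pushes the chosen threshold upward, trading missed faces for fewer false detections.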
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1. A digital camera for capturing an image of a scene, said digital camera comprising: 
a capture section for capturing an image and producing image data;
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a storage medium for storing the image data; and 
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11. A digital camera for capturing an image of a scene, said digital camera comprising: 
a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

face data means associated with the processing section for generating face data corresponding to at least
one of the location, orientation, scale and pose of at least one of the faces in the image;

a composition algorithm associated with the processing section for processing the face data and generating
composition suggestions for a user of the digital camera; and

a display device for displaying the composition suggestions to the user.

12. A digital camera as claimed in claim 11 wherein the composition suggestions include at least one of (a) an indication
that a main subject is too small in the image, (b) an indication that following the law of thirds will lead to a more
pleasing composition, (c) an indication that one or more faces have been cut off in the image, and (d) an indication that
a horizontal alignment of subjects should be avoided in the image.

13. A digital camera for capturing an image of a scene, said digital camera comprising:

a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene and generating face data therefrom;

an orientation algorithm associated with the processing section for generating orientation data indicating
orientation of the image based on the orientation of at least one of the faces in the image;

a storage medium for storing the image data; and

recording means associated with the processing section for recording the orientation data with the image data
on the storage medium.

14. A digital camera for capturing an image of a scene, said digital camera comprising:

a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

a red eye detection algorithm associated with the electronic processing section for generating red eye signals
indicating the presence of red eye in one or more of the faces; and

a display device responsive to the red eye signals for displaying a red eye warning to a user of the digital
camera.

15. The digital camera as claimed in claim 14 further comprising red eye correction means responsive to the red eye 
signals for correcting the red eye in said one or more faces. 

16. A hybrid camera for capturing an image of a scene on both an electronic medium and a film medium having a
magnetic layer, said hybrid camera comprising:

an image capture section for capturing an image with an image sensor and producing image data;

means for capturing the image on the film medium;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

face data means associated with the electronic processing section for generating face data corresponding to
at least one of the location, scale and pose of at least one of the faces in the image; and

means for writing the face data on the magnetic layer of the film medium.

17. The hybrid camera as claimed in claim 16 further comprising:

a storage medium for storing the image data; and

recording means associated with the processing section for recording the face data with the image data on
the storage medium.

18. The hybrid camera as claimed in claim 16 wherein the electronic processing section provides an indication that 
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one or more faces have been detected. 
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20. The hybrid camera as claimed in claim 16 wherein the electronic processing section further includes a face rec- 
records the facial identities of known faces on the magnetic layer of the film medium.

21. The hybrid camera as claimed in claim 16 further including: 
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ence of one or more faces for optimally exposing the film medium for at least one of the faces in the scene.
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controls activation of the flash unit in order to optimize exposure for at least one of the faces in the scene.



24. A hybrid camera for capturing an image of a scene on both an electronic and on a film medium, said camera
comprising:

an image capture section for capturing an image with an image sensor and producing image data;
means for capturing the image on a film medium;
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a display device for displaying the composition suggestions to the user. 
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indication that a main subject is too small in the image, (b) an indication that following the law of thirds will lead to

that a horizontal alignment of subjects should be avoided in the image.
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an image capture section for capturing an image with an image sensor and producing image data;
means for capturing an image and producing image data;
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27. A method for determining the presence of a face from image data, said method comprising the steps of: 
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(b) operating on the face candidate regions with a second algorithm using a pattern matching technique to 
represent each face candidate region of the image and thereby confirm a facial presence in the region, whereby 
the combination of these algorithms provides higher performance in terms of detection levels than either al- 
gorithm individually. 

28. A method for verifying the presence of red eye in an image, said method comprising the steps of: 

providing image data representative of the image; 

processing the image data with a red eye detection algorithm for generating red eye signals indicating the 
presence of red eye in the image; and 

corroborating the existence of red eye by utilizing a face detection algorithm to verify that the red eye signals 
correspond to the presence of one or more faces in the image. 

29. The method as claimed in claim 28 further comprising the step of correcting the red eye in the image. 
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